feat: Integrate coolbpf cpu profiling feature in loongcollector #2391
wokron wants to merge 99 commits
Pull Request Overview
This PR integrates coolbpf CPU profiling capabilities into loongcollector by adding a new input_cpu_profiling plugin. The implementation enables continuous CPU profiling of specified processes through command-line pattern matching and container discovery.
Key changes:
- New CPU profiling plugin with process discovery mechanism
- Integration with coolbpf profiler library
- Plugin registration and lifecycle management
Reviewed Changes
Copilot reviewed 39 out of 39 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| core/plugin/input/InputCpuProfiling.{h,cpp} | New plugin implementation for CPU profiling input |
| core/ebpf/plugin/cpu_profiling/* | Core CPU profiling manager and process discovery logic |
| core/ebpf/driver/CpuProfiler.h | Wrapper for coolbpf profiler library integration |
| core/ebpf/Config.{h,cpp} | CPU profiling configuration option handling |
| core/ebpf/include/export.h | Type definitions for CPU profiling |
| core/unittest/input/InputCpuProfilingUnittest.cpp | Unit tests for the input plugin |
| core/unittest/ebpf/*Unittest.cpp | Unit tests for CPU profiling components |
| Various CMakeLists.txt | Build system updates for new components |
Comments suppressed due to low confidence (1)
core/unittest/input/InputCpuProfilingUnittest.cpp:1
- Corrected spelling of 'CommandLines' to 'CommandLines' in comment context.
    mEBPFAdapter->UpdatePlugin(PluginType::CPU_PROFILING,
                               buildCpuProfilingConfig(std::move(totalPids), std::nullopt, nullptr, nullptr));
Passing null handler and context to buildCpuProfilingConfig during update will overwrite the valid handler set during initialization, breaking profiling event handling.
I checked the code and this is a false positive. Still, I suggest adding a comment stating that an update can only change the PIDs and nothing else, or introducing a separate buildUpdateCpuProfilingConfig.
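The second option in the comment above (a dedicated builder for updates) could look roughly like the following. This is only a sketch under stated assumptions: `CpuProfilingConfigSketch`, `CpuProfilingPidUpdateSketch`, `BuildUpdateCpuProfilingConfigSketch`, and `ApplyPidUpdate` are hypothetical names invented for illustration and are not the plugin's real types or functions. The idea is that an update payload that carries nothing but PIDs cannot overwrite the handler or context by construction.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical stand-in for the full CPU profiling config (not the real type).
struct CpuProfilingConfigSketch {
    std::unordered_set<uint32_t> pids;
    std::function<void(uint32_t pid, const std::string& stacks)> handler;
    void* ctx = nullptr;
};

// Hypothetical update payload: it carries only the PID set, so there is
// no handler or context field that an update could accidentally reset.
struct CpuProfilingPidUpdateSketch {
    std::unordered_set<uint32_t> pids;
};

// Hypothetical counterpart of the suggested buildUpdateCpuProfilingConfig.
inline CpuProfilingPidUpdateSketch BuildUpdateCpuProfilingConfigSketch(std::unordered_set<uint32_t> pids) {
    return CpuProfilingPidUpdateSketch{std::move(pids)};
}

// Replaces the PID set and leaves the handler and context untouched.
inline void ApplyPidUpdate(CpuProfilingConfigSketch& current, CpuProfilingPidUpdateSketch update) {
    current.pids = std::move(update.pids);
}
```

With a separate payload type, the "only PIDs can be updated" rule is enforced by the type rather than by a comment.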
Force-pushed: ecf5b83 to a3a8e56
Force-pushed: c00d080 to 0f58800
Force-pushed: 2a5948d to 6c51221
Please add usage documentation, modeled on the docs for the other plugins.
    std::lock_guard<std::mutex> lock(mMutex);
    if (mProfiler == nullptr) {
        livetrace_enable_tracing();
        mProfiler = livetrace_profiler_create();
Are there any resource-related parameters, such as one that controls the queue size? How is the profiler's resource usage bounded?
Queue size control is provided on the coolbpf side, which sizes a bounded queue based on the profile period. https://gitee.com/anolis/coolbpf/blob/master/src/profiler/src/probes/probes.rs#L274
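I took a look: coolbpf does not set up a bounded queue based on the profile period. The queue is unbounded, so in theory, if the consumer (LoongCollector Poll) cannot keep up with the production rate, the channel grows without limit. Queue size control is not on the coolbpf side at all.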
I linked the wrong line :( , it should be line 284, quoted below:
    let ms = profile_period() as usize;
    let sample_per_sec = 1000 / ms;
    let ten_sec_samples = sample_per_sec * 10 * num_possible_cpus().unwrap_or(1);
    log::info!("cache max stack samples: {}", ten_sec_samples);
    let (tx, rx) = crossbeam_channel::bounded(ten_sec_samples);
Here crossbeam_channel::bounded(ten_sec_samples); creates a bounded queue of length ten_sec_samples.
The sending end tx is passed to the two callbacks thread_poll_trace_event and thread_poll_report_event, which poll the perf buffer in two separate threads.
The receiving end rx is passed to Probe. In the end, livetrace_profiler_read reads its data from the receiving end.
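To make the queue capacity concrete, the arithmetic from the quoted snippet can be reproduced as a small C++ helper. This is a sketch, not coolbpf code: `TenSecSamples` is a hypothetical name, and the period and CPU counts used with it are illustrative inputs rather than coolbpf defaults.

```cpp
#include <cassert>
#include <cstddef>

// Mirrors the integer arithmetic of the quoted Rust lines:
//   sample_per_sec  = 1000 / ms
//   ten_sec_samples = sample_per_sec * 10 * num_cpus
// periodMs must be non-zero, since the division would otherwise fault.
inline std::size_t TenSecSamples(std::size_t periodMs, std::size_t numCpus) {
    assert(periodMs > 0);
    const std::size_t samplePerSec = 1000 / periodMs;  // integer division truncates
    return samplePerSec * 10 * numCpus;
}
```

For example, an assumed 10 ms period on 8 CPUs gives 100 * 10 * 8 = 8000 slots. Because the division truncates, a period longer than 1000 ms yields a capacity of 0, which `crossbeam_channel::bounded` treats as a zero-capacity (rendezvous) channel.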
Force-pushed: 37e01d9 to 8191f24
The documentation has been added.
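Bailian automated review: recommend keeping this PR open. This PR adds a coolbpf-based CPU profiling plugin (input_cpu_profiling) to LoongCollector, including the core C++ implementation, eBPF driver adaptation, a process discovery manager, unit tests, and Chinese documentation. It is under active review: on 2026-05-12 the maintainers raised several code-quality and robustness suggestions, the author has responded and fixed some of the issues, and the latest commit was pushed on 2026-05-18. The PR is currently mergeable and should stay open until review and merge are complete. Best path forward: the author should keep following up on the remaining review comments from 2026-05-12 (for example, making parseStackCnt parsing more robust, the logic for draining trailing data in Poll when the PID set is empty, reducing the /proc scan frequency, and adding the missing unit tests); once CI passes and the review is approved, a maintainer merges. Items verified:
Bailian review note: model qwen3.6-max-preview; compared against commit b42ea269f84a.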
    StringView symbolView(symbol);
    StringViewSplitter splitter(symbolView, "\n");
    for (const auto& line : splitter) {
        auto pos1 = line.find(';');
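- Problem: parseStackCnt splits `<comm>:<pid>` from the stack at the first `;`. If comm contains `;`, the whole line is parsed incorrectly.
- Impact: /proc/*/comm can contain special characters (although this is rare); a misaligned parse leads to corrupted stack data or silently dropped data.
- Suggestion: make the parsing more robust by matching `:<pid>;` from the right, or state coolbpf's format contract explicitly in comments/docs and add validation.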
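One possible shape for a more tolerant header split is sketched below. It is not the PR's parseStackCnt: `StackHeader` and `SplitStackLine` are hypothetical names, the sketch assumes only the `<comm>:<pid>;<stack...>` layout described in the comment, and it uses `std::string_view` in place of the project's StringView. Instead of trusting the first `;`, it accepts a `;` as the header boundary only when the text before it ends in `:<digits>`. Note that it scans candidate separators from left to right, which differs from the right-to-left matching suggested above; that ordering is a choice of this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical result type: the header fields plus the unparsed remainder.
struct StackHeader {
    std::string_view comm;
    uint32_t pid;
    std::string_view rest;  // everything after the header's ';'
};

// Returns the header if some ';' is directly preceded by ":<digits>",
// otherwise std::nullopt so the caller can log and skip the line.
inline std::optional<StackHeader> SplitStackLine(std::string_view line) {
    std::size_t semi = line.find(';');
    while (semi != std::string_view::npos) {
        const std::string_view head = line.substr(0, semi);
        const std::size_t colon = head.rfind(':');
        if (colon != std::string_view::npos && colon + 1 < head.size()) {
            bool valid = true;
            uint64_t pid = 0;
            for (char c : head.substr(colon + 1)) {
                if (!std::isdigit(static_cast<unsigned char>(c))) {
                    valid = false;
                    break;
                }
                pid = pid * 10 + static_cast<uint64_t>(c - '0');
                if (pid > UINT32_MAX) {  // reject values that cannot be a PID
                    valid = false;
                    break;
                }
            }
            if (valid) {
                return StackHeader{head.substr(0, colon), static_cast<uint32_t>(pid), line.substr(semi + 1)};
            }
        }
        semi = line.find(';', semi + 1);  // try the next candidate separator
    }
    return std::nullopt;
}
```

Returning `std::nullopt` for lines without a valid `:<pid>;` token also covers the validation half of the suggestion: malformed lines are rejected explicitly rather than parsed at the wrong offset.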
Force-pushed: 54222e9 to 9f1231f
Force-pushed: 9f1231f to 9b0716c
Force-pushed: eef380d to dd8648e
No description provided.